Increasing Instruction-Level Parallelism with Instruction Precomputation
Authors
Abstract
Value reuse improves a processor's performance by dynamically caching the results of previous instructions and reusing those results to bypass the execution of future instructions that have the same opcode and input operands. However, continually replacing the least recently used entries can eventually fill the value reuse table with instructions that are not frequently executed. Furthermore, the complex hardware that replaces entries and updates the table may necessitate an increase in the clock period. We propose instruction precomputation to address these issues: programs are profiled to determine the opcodes and input operands with the highest frequencies of execution, and these instructions are then loaded into the precomputation table before the program executes. During program execution, the precomputation table is used in the same way as a value reuse table, except that it does not dynamically replace any entries. For a 2K-entry precomputation table implemented on a 4-way issue machine, this approach produced an average speedup of 11.0%; by comparison, a 2K-entry value reuse table produced an average speedup of 6.7%. For the same number of table entries, instruction precomputation outperforms value reuse, especially for smaller tables, while using less area and having a lower access time.
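The paper itself gives no code, but the flow it describes is straightforward to sketch. The C fragment below is a rough illustration of the off-line profiling step: count how often each unique combination of opcode and input operands executes, then keep the most frequent ones to preload into the precomputation table. All identifiers (PrecompEntry, profile_record, profile_top_n) and the linear-scan bookkeeping are illustrative assumptions, not details taken from the paper.

#include <stdlib.h>
#include <stdint.h>

typedef struct {
    uint8_t  opcode;
    uint64_t op1, op2;   /* input operand values                  */
    uint64_t result;     /* precomputed result of the instruction */
    uint64_t count;      /* dynamic execution frequency           */
} PrecompEntry;

#define MAX_UNIQUE 65536

static PrecompEntry profile[MAX_UNIQUE];
static size_t n_unique = 0;

/* Record one dynamic instance of an instruction during the profiling run. */
static void profile_record(uint8_t opcode, uint64_t op1, uint64_t op2,
                           uint64_t result)
{
    for (size_t i = 0; i < n_unique; i++) {
        if (profile[i].opcode == opcode &&
            profile[i].op1 == op1 && profile[i].op2 == op2) {
            profile[i].count++;
            return;
        }
    }
    if (n_unique < MAX_UNIQUE)
        profile[n_unique++] = (PrecompEntry){opcode, op1, op2, result, 1};
}

/* Comparator: sort descending by execution frequency. */
static int by_count_desc(const void *a, const void *b)
{
    const PrecompEntry *x = a, *y = b;
    return (y->count > x->count) - (y->count < x->count);
}

/* Copy the n most frequent unique instructions into the table that is
 * preloaded before the program executes. */
static void profile_top_n(size_t n, PrecompEntry *table)
{
    qsort(profile, n_unique, sizeof(PrecompEntry), by_count_desc);
    for (size_t i = 0; i < n && i < n_unique; i++)
        table[i] = profile[i];
}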
Similar Resources
Instruction Precomputation: Dynamically Removing Redundant Computations Using Profiling
As a program executes, some computations are performed over and over again. These redundant computations increase the program's execution time because they require multiple cycles to execute and consume limited processor resources. To minimize the performance degradation that redundant computations have on the processor, we propose using Instruction Precomputation hardware to dynamic...
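Assuming a table preloaded with the entries selected by the profiling sketch above, the run-time side of the technique might look like the following simplified fragment. A hit returns the cached result so the instruction can bypass the functional units, a miss falls through to normal execution, and the table contents are never modified or replaced, which is the key difference from a value reuse table. The 2048-entry size simply mirrors the 2K-entry table discussed in the abstract, and the record layout is repeated so the fragment stands alone.

#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Same illustrative record layout as the profiling sketch above. */
typedef struct {
    uint8_t  opcode;
    uint64_t op1, op2;
    uint64_t result;
    uint64_t count;      /* nonzero marks a valid, preloaded entry */
} PrecompEntry;

#define TABLE_SIZE 2048  /* mirrors the 2K-entry table evaluated in the abstract */

static PrecompEntry precomp_table[TABLE_SIZE];

/* Returns true on a hit and writes the cached result, letting the
 * instruction bypass execution; on a miss the caller executes the
 * instruction normally.  The table is never updated at run time. */
static bool precomp_lookup(uint8_t opcode, uint64_t op1, uint64_t op2,
                           uint64_t *result)
{
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        if (precomp_table[i].count != 0 &&
            precomp_table[i].opcode == opcode &&
            precomp_table[i].op1 == op1 &&
            precomp_table[i].op2 == op2) {
            *result = precomp_table[i].result;
            return true;
        }
    }
    return false;
}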
Exploring the Capacity of a Modern SMT Architecture to Deliver High Scientific Application Performance
Simultaneous multithreading (SMT) has been proposed to improve system throughput by overlapping instructions from multiple threads on a single wide-issue processor. Recent studies have demonstrated that heterogeneity of simultaneously executed applications can bring up significant performance gains due to SMT. However, the speedup of a single application that is parallelized into multiple threa...
Optimum Instruction-level Parallelism (ILP) for Superscalar and VLIW Processors
Modern superscalar and VLIW processors fetch, decode, issue, execute, and retire multiple instructions per cycle. By taking advantage of instruction-level parallelism (ILP), processor performance can be improved substantially. However, increasing the level of ILP may eventually result in diminishing and negative returns due to control and data dependencies among subsequent instructions as well ...
Global Trade-off between Code Size and Performance for Loop Unrolling on VLIW Architectures
Many media processors [28, 7, 14, 8, 18, 27], used for computing-intensive embedded applications, are VLIW architectures that rely on the compiler to exploit Instruction Level Parallelism. Loop unrolling is generally used to expose instruction parallelism but computing the unrolling factor is very difficult as instruction cache misses and spill code can cancel the expected benefit of the transforma...
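As a concrete, purely illustrative example of the trade-off described here (not code from the cited paper), unrolling a simple loop by a factor of four exposes four independent operations per iteration that a VLIW or superscalar scheduler can overlap, while roughly quadrupling the size of the loop body and requiring an epilogue for leftover iterations.

/* Manual loop unrolling by a factor of 4: more ILP per iteration,
 * but a larger code footprint and extra register pressure. */
void saxpy(float *y, const float *x, float a, int n)
{
    int i = 0;
    /* Unrolled portion: four independent multiply-adds per iteration. */
    for (; i + 3 < n; i += 4) {
        y[i]     += a * x[i];
        y[i + 1] += a * x[i + 1];
        y[i + 2] += a * x[i + 2];
        y[i + 3] += a * x[i + 3];
    }
    /* Epilogue handles iterations left over when n is not a
     * multiple of the unrolling factor. */
    for (; i < n; i++)
        y[i] += a * x[i];
}

The compiler must weigh this larger footprint against the instruction cache misses and spill code it may cause, which is exactly the global trade-off the paper studies.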
Instruction Level Parallelism Loop Unrolling
K – Survey of Instruction Set Architectures related to instruction-, data-, thread-, and request-level parallelism necessary for understanding loop unrolling. ILP, compiler techniques to increase ILP. Register Renaming, Pipeline Scheduling, Loop Unrolling. Conclusion. CPE 731, ILP. 3. Instruction Level Parallelism. 5 Optimizing Program Performance (Loop Unrolling and Enhancing Parallelism) Michael.